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Abstract 

This paper is concerned with the computation of the principal components for a general 
tensor, known as the tensor principal component analysis (PCA) problem. We show that the 
general tensor PCA problem is reducible to its special case where the tensor in question is super- 
symmetric with an even degree. In that case, the tensor can be embedded into a symmetric 
matrix. We prove that if the tensor is rank-one, then the embedded matrix must be rank- 
one too, and vice versa. The tensor PCA problem can thus be solved by means of matrix 
optimization under a rank-one constraint, for which we propose two solution methods: (1) 
imposing a nuclear norm penalty in the objective to enforce a low-rank solution; (2) relaxing 
the rank-one constraint by Semidefinite Programming. Interestingly, our experiments show 
that both methods yield a rank-one solution with high probability, thereby solving the original 
tensor PCA problem to optimality with high probability. To further cope with the size of the 
resulting convex optimization models, we propose to use the alternating direction method of 
multipliers, which reduces significantly the computational efforts. Various extensions of the 
model are considered as well. 
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1 Introduction 



Principal component analysis (PCA) plays an important role in applications arising from data 
analysis, dimension reduction and bioinformatics etc. PCA finds a few linear combinations of the 
original variables. These linear combinations, which are called principal components (PCs), are 
orthogonal to each other and explain most of the variance of the data. PCs provide a powerful tool 
to compress data along the direction of maximum variance to reach the minimum information loss. 
Specifically, let ^ = (^i, . . . , ^m) be an m-dimensional random vector. Then for a given data matrix 
A £ iK'^x" which consists of n samples of the m variables, finding the PC that explains the largest 
variance of the variables (^i, . . . ,^m) corresponds to the following optimization problem: 

(X*,x*,y*) := min 11^ - Axy"^!!. (1) 

Problem (1) is well known to be reducible to computing the largest singular value (and correspond- 
ing singular vectors) of A, and can be equivalently formulated as: 



maxa; . 



s.t. 




(2) 



Note that the optimal value and the optimal solution of Problem (2) correspond to the largest 
eigenvalue and the corresponding eigenvector of the symmetric matrix ' 

Although the PCA and eigenvalue problem for matrix have been well studied in the literature, the 
research of PCA for tensors is still lacking. Nevertheless, the tensor PCA is of great importance in 
practice and has many applications in computer vision [46] , diffusion Magnetic Resonance Imaging 
(MRI) [15, 2, 41], quantum entanglement problem [22], spectral hypergraph theory [23] and higher- 
order Markov chains [29]. This is mainly because in real life we often encounter multidimensional 
data, such as images, video, range data and medical data such as CT and MRI. A color image can 
be considered as 3D data with row, column, color in each direction, while a color video sequence 
can be considered as 4D data, where time is the fourth dimension. Moreover, it turns out that it is 
more reasonable to treat the multidimensional data as a tensor instead of unfolding it into a matrix. 
For example, Wang and Ahuja [46] reported that the images obtained by tensor PCA technique 
have higher quality than that by matrix PCA. Similar to its matrix counterpart, the problem of 
finding the PC that explains the most variance of a tensor A (with degree m) can be formulated 
as: 

min — Ax^ (8) (X" • • • (8) x™"!! 

II ' II ^ 
s.t. A G R, = 1, i = 1, 2, . . . , m, 
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which is equivalent to 

max A{x^ , x'^ , ■ ■ ■ , x"^) 



„ „ (4) 
s.t. = 1, i = 1, 2, . . . , m, 



where denotes the outer product between vectors, in other words, 

m 
k=l 

The above problem is also known as the best rank-one approximation of tensor A; cf. [25, 24]. As 
we shall find out later in the paper, problem (4) can be reformulated as 

max T{x, x, - ■ ■ ,x) 

+ II II 1 
s.t. ||x|| = 1, 

where is a super-symmetric tensor. Problem (5) is NP-hard and is known as the maximum Z- 
eigenvalue problem. Note that a variety of eigenvalues and eigenvectors of a real symmetric tensor 
are introduced by Lim [30] and Qi [39] independently in 2005. Since then, various methods have 
been proposed to find the Z-eigenvalues [7, 40, 24, 25], which possibly correspond to some local 
optimums. In this paper, we shall focus on finding the global optimal solution of (5). 

Before proceeding let us introduce notations that will be used throughout the paper. We denote 
R" to be the n-dimensional Euclidean space. A tensor is a high dimensional array of real data, 
usually in calligraphic letter, and is denoted as ^ = (^iij2---im)nixn2x---xn„- The space where 
ni X ^2 X • • • X rim dimensional real- valued tensor resides is denoted by R_"iX"2x---xnm_ ■\Ye call 
A super-symmetric if ni = n2 = • • • = Um and Aij^i^-.-i^ is invariant under any permutation of 
(n,i2,...,im), i.e., Aii2-j™ = -47r(n,j2,-,jm)' where 7r(ii,i2,--- ,im) is any permutation of indices 
^2) • ■ ■ 5 ^m)- The space where n x n x • • • x n super-symmetric tensors reside is denoted by S""". 

V ' 

m 

Special cases of tensors are vector (m = 1) and matrix (m = 2), and tensors can also be seen as a 
long vector or a specially arranged matrix. For instance, the tensor space J{,"ixn2x - xnm ^^^^ ^^sq 
be seen as a matrix space R,("iX"-2x---xn„iJx(n™j+ixn„-^+2x---xnm)^ where the row is actually an mi 
array tensor space and the column is another m — mi array tensor space. Such connections between 
tensor and matrix re-arrangements will play an important role in this paper. As a convention in this 
paper, if there is no other specification we shall adhere to the Euclidean norm (i.e. the L2-norm) 
for vectors and tensors; in the latter case, the Euclidean norm is also known as the Frobenius norm. 



and is sometimes denoted as ||^||_f = i/y^i , Af , , . For a given matrix X, we use 

to denote the nuclear norm of X, which is the sum of all the singular values of X. Regarding the 

products, we use (8> to denote the outer product for tensors; that is, for Ai € j^nixn2x — xn^ 

A2 G R"-+iX"™+2X-X"™+f, Ai(S)A2 is in R"iX"2X-xn,„+^ ^-^j^ 
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The inner product between tensors Ai and A2 residing in the same space p{,"ix"2x---xnm jg (Jgnoted 

Al»A2= ^ (•^l)jii2---im(-^2)iii 



„1 ^2 



Under this light, a multi-hnear form A{x ,x , ...,x"^) can also be written in inner/outer products 
of tensors as 



il,--- ,im ii,---,im k=l 



m 

k 



In the subsequent analysis, for convenience we assume m to be even, i.e., m = 2d in (5), where d is 
a positive integer. As we will see later, this assumption is essentially not restrictive. Therefore, we 
will focus on the following problem of largest eigenvalue of an even order super-symmetric tensor: 

max J-{x, ■ ■ ■ ,x) 

(6) 

s.t. ||x|| = 1, 

where is a 2d-th order super-symmetric tensor. In particular, problem (6) can be equivalently 
written as 

max • X <^ ■ ■ ■ <^ X 

^ M ' (7) 
s.t. ||x|| = 1. 

Given any 2d-th. order super-symmetric tensor form J^, we call it rank one if = a (8) • • • ® a for 

2d 

some a G R". Moreover, the CP rank of is defined as follows. 

Definition 1.1 Suppose J- E S"^*^, the CP rank of J- denoted by rank(J^) is the smallest integer r 
such that 

r 

J^ = J2^i • • • ^ a\ , (8) 



1=1 



2d 



where a,- G R", A,- G R^. 



In the following, to simplify the notation, we denote K{n, d) = <^k = {ki, ■ ■ ■ , kn) G 
and 

<^]^2fej 22^2 - n^fen 2... 2... n...n- 



Y.kj = d 



By letting X = x ■ ■ ■ x we can further convert problem (7) into: 



2d 

max • X 



s.t. ^ n"'^' fe l'^l^fci 22*^2 ■■■n^fcn — 1) (q\ 

keK{n,d) ^ ' 

X e S""", rank(Af) = 1, 
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where the first equahty constraint is due to the fact that Yl yI>'-'^' k l 11^=1 ^ — II ^IP'^ = 1- 

The difficulty of the above problem lies in the dealing of the rank constraint rank(A') = 1. Not 
only the rank function itself is difficult to deal with, but also determining the rank of a specific 
given tensor is already a difficult task, which is NP-hard in general [20]. To give an impression of 
the difficulty involved in computing tensor ranks, note that there is a particular 9x9x9 tensor 
(cf. [26]) whose rank is only known to be in between 18 and 23. One way to deal with the difficulty 
is to convert the tensor optimization problem (9) into a matrix optimization problem. A typical 
matricization technique is the so-called mode-n matricization [25]. Roughly speaking, given a tensor 
A £ ]^"ix»^2x---x?im^ j-Qode-n matricization denoted by A{n) is to arrange n-th index of A to be 
the column index of resulting matrix and merge other indices of A into the row index of A{n). The 
so-called n-rank of A is the vector with m dimension such that its n-th component corresponds 
to the column rank of mode-n matrix A{n). The notion of n-rank has been widely used in the 
problems of tensor decomposition. Most recently, Liu et al. [33] and Gandy et al. [14] considered 
the low-n-rank tensor recovery problem, which were the first attempts to solve tensor low rank 
optimization problems. However, up till now, the relationship between the n-rank and CP rank is 
still unclear. Therefore, in the following we shall introduce a new scheme to fold a tensor into a 
matrix, where we use half of the indices of tensor to form the row index of a matrix and use the 
other half as the column index. Most importantly, in next section, we manage to establish some 
connection between the CP rank and the rank of the resulting unfolding matrix. 

Definition 1.2 For a given even-order tensor T G R"^"*, we define its square matrix rearrange- 
ment, denoted by M{F) G '^n'^'xn''- ^ following: 

M{^)ke ■= -^n---jd«d+i---«2d' ^ ^ h, ■ ■ ■ ,id, id+i, ■ ■ ■ ,i2d < n, 

where 

d 

k = "^{ij - l)n'^-i + 1, and £ 
i=i 

Similarly we introduce below the vectorization of a tensor. 
Definition 1.3 The vectorization, V{J-), of tensor T £ R*^™ is defined as 

where 

m 

k = ^{ij - l)n™~^' + 1, 1 < ii, • • • , < n. 



2d 

= - l)n''-^ + 1. 

j=d+i 



5 



In the same vein, we can convert a vector or a matrix with appropriate dimensions to a tensor. In 
other words, the inverse of the operators M and V can be defined in the same manner. In the 
following, we denote X = M{X), and so 

d 

Tr (X) = ^ Xe^e with £ = ^{ij - l)n'^-^ + 1. 
e j=i 

If we assume X to be of rank one, then 

h,--- ,id h,---,id 

In the above expression, (ii, • • • , id) is a subset of (1,2,..., n). Suppose that j appears kj times in 

n 

(ii, • • • , id) with j = 1, 2, . . . , n and Yl % = d. Then for a fixed outcome (ki, k2, - • • , kn), the total 
number of permutations (ii, • • • , id) to achieve such outcome is 

/ d\ /d-ki\ I'd-ki-k2\ fd-ki kn-i\ _ d\ 

Consequently, 

Tr(X)= Y '^il-il^ fp ^'^l2fcl22'=2...„2fcn- (10) 

In light of the above discussion, if we further denote F = M{T), then the objective in (9) is 

T»X = Ttl{FX), while the first constraint ^ T-,n^' ^ , Af^2fci22'=2 n^^n = 1 Tr (X) = 1. 

fce]K(n,d) 

The hard constraint in (9) is rank(Af) = 1. It is straightforward to see that if X is of rank one, 
then by letting X = x ^ ■ ■ ■ ® x and 3^ = a; (g) • • • (g) x, we have M{X) = V{y)V{y)^ , which is to 

2d d 

say that matrix M{X) is of rank one too. In the next section we shall continue to show that the 
other way around is also true. 

2 Equivalence Under the Rank- One Hypothesis 

We give the definition of proper tensor first. 

Definition 2.1 We call A € R"'* a proper n dimensional d-th order tensor if for any index k there 
exits a d-tuple {ii,--- 71 {k} such that Ai^^... / 0. 

We have the following lemma for a proper super-symmetric tensor. 
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Lemma 2.1 Suppose A G is a proper n dimensional d-th order tensor such that T = A® A G 
S"^"*, i.e., T is super- symmetric. If A is also super-symmetric, then the diagonal element Aj^d ^ 
for all 1 < k < n. 

Proof. For any given index k, suppose there is an m-tuple (ii • • -im) such tliat Ajj...j^fcd-m = 0. 
For any j,n+i ^ k, we have, 

a2 _ jr 

TT 

~ •'h---imk''-'^ii---imjm+ljm+ik''~"^~^ 

— A.^_, ud — m A.^^ ^ , ^ ^' , ^ h,d — m — 2 — 0. 

L\ Am"' Al l'mjva-\-lJvn.-\-l'^ 

This implies that 

= 0. V < m < £ < d, and jm+i, ■■■ ,ji^k. 

Therefore, if there is an index k with Ak^...^k = 0, then Aj^...j^k<'--t = for all < ^ < d and 
Ji) ■ ■ ■ iJi ^- This, combined with the assumption that ^ is a super-symmetric tensor, contradicts 
the fact that A is proper. □ 

We further prove the following proposition for proper tensor. 

Proposition 2.2 Suppose A G R"'' is a proper n dimensional d-th order tensor. The following 
two statements are equivalent: 

(i) ^ G S"'', and rank(^) = 1; 
(u) ^®^GS"''. 

Proof. We shall first show (i) =^ (ii). Suppose A G S'^'* with rank(^) = 1. Then there exits a 
vector a G R" such that A = a(S> • • • (gi a . Consequently, A^ A = a (8) a ig ) ■ ■ ■ a G S"^'' . 

d 2d 

Now we prove (ii) =^ (i). We denote T = A(E> A € S"^''. For any d-tuples {ii, • • • , id}, and one of 
its permutations {ji, ■ ■ ■ ,jii} G 7r(ii, • • • , i,^), it holds that 

= ^ii,--- ,id,ii,--- ,id ~^ -^ji,--- ,jd,ji,--- ,jd ~ '^'^h,--- ,id,ji,--- Jd ~ 0- 

where the last equality is due to the fact that is super-symmetric. Therefore, A is super- 
symmetric. In the following, we will prove that A G R"'' is a rank-one tensor by induction on d. 
It is evident that A is rank-one when d = I. Now we assume that A is rank-one when A G R"'' 
and we will show that the conclusion holds when the order of A is d. 
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For A € R'^ , we already proved that A is super-symmetric. Since A is proper, by Lemma 2.1 we 
know that A/^d for aU 1 < A; < n. We further observe that T € S"'^'' imphes 

for any (ii, • • • , id-i)- As a result, 

•^il---id-lj ~ A ' •^ii---id-ik-i ji ^1 i'^li ' ' ' l)- 

Now we can construct a vector b G R" with bj = ^ and a tensor A{k) G R'"'' ^ with 
-^(^)ir- id-i = Ai-id_ifc, such that 

^ = 6®^(A;), (11) 

and 

where the last equality is due to € g"^'', Qn the other hand, we notice that Ajd-i)^ ^ for all 
^ ^ j ^ n. This is because if this is not true then we would have 

= Ajd-lf^Af^d-lj = AjdAj^d, 

which contradicts the fact that A is proper. This means that all the diagonal elements of A{k) 

2d — 2 

are nonzero, implying that A{k) is a proper tensor. Moreover, A{k) A{k) G S" , because J- is 
super-symmetric. Thus by induction, we can find a vector a E R" such that 

A(k) = a (8) a (8) • • • (8) a . 

^ V ' 

d-i 

Plugging the above into (11), we get ^ = 6(8)a(8)a(8>---(8'a, and thus A is of rank one. □ 



The following proposition shows that the result in Proposition 2.2 holds without the assumption 
that the given tensor is proper. 

Proposition 2.3 Suppose A G R""*. Then the following two statements are equivalent: 

{{) ^ G S"'', and rank(^) = 1; 
(ii) ^0^gS"''. 

Proof, (i) =^ (ii) is readily implied by the same argument in that of Proposition 2.2. To show (ii) 
=^ (i), it suffices to prove the result when A is not proper. Without loss of generality, we assume 
k + 1, ...,n are all such indices that Aj^...j^ = if {ji, • • • , j^} 3 {£} with k + 1 < i < n. Now 
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introduce tensor B € R'^ such that Bi^^... = Ai^^...^i^ for any 1 < ii, - ■ ■ ,id 1^ k. Obviously B is 
proper. Moreover, since A® A G S" , it follows that B®B G S*" . Thanks to Proposition 2.2, 
there exists a vector h G R'"' such that B = b ^ ■ ■ ■ (E> b. Finally, by letting = (6^, 0, • • • ,0), we 

d n-k 

have A = a iSi ■ ■ ■ a. □ 

d 

Now we are ready to present the main result of this section. 

Theorem 2.4 Suppose X G S"'" and X = M{X) G R"''x"^ Then we have 



rank(A') = 1 rank(X) = 1. 



Proof. As remarked earlier, that rank(A') = 1 =^ rank(X) = 1 is evident. To see this, suppose 

rank(Af) = 1 and X = a; ig) • • • (g) x for some x G R". By constructing y = x (g • • • ig x , we have 

2d d 
X = M{X) = V{y)V{yY, which leads to rank(X) = 1. 

To prove the other implication, suppose that we have X G S"^'' and M{X) is of rank one, i.e. 
M{X) = yy^ for some vector y G R"" . Then X = V ^{y) (g V ^{y), which combined with 
Proposition 2.3 implies V~^{y) is supper-symmetric and of rank one. Thus there exits x G R" such 
that V~^{y) = X (g • • • (g X and X = x ig • • • g) x^ . □ 

d 2d 



3 A Nuclear Norm Penalty Approach 



According to Theorem 2.4, we know that a super-symmetric tensor is of rank one, if and only 
if its matrix correspondence obtained via the matricization operation defined in Definition 1.2, is 
also of rank one. As a result, we can reformulate Problem (9) equivalently as the following matrix 
optimization problem: 

max Tr {FX) 

s.t. TY(X) = 1, M-^^) G S""", (12) 
X G S*^"^"", rank(X) = 1, 

where X = M{X), F = M{F), and S"''^"'' denotes the set of n'^ x n'^ symmetric matrices. 
Note that the constraints M^^{X) G S"^'' requires the tensor correspondence of X to be super- 
symmetric, which essentially correspond to 0{n'^'^) linear equality constraints. The rank constraint 
rank(X) = 1 makes the problem intractable. In fact, Problem (12) is NP-hard in general, due to 
its equivalence to problem (6). 
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There have been a large amount of work that deal with the low-rank matrix optimization problems. 
Research in this area was mainly ignited by the recent emergence of compressed sensing [5, 8], 
matrix rank minimization and low-rank matrix completion problems [42, 4, 6]. The matrix rank 
minimization seeks a matrix with the lowest rank satisfying some linear constraints, i.e., 

min rank(X), s.t., C(X) = 6, (13) 

where b G HP and C : R"i^"2 _^ j^p jg linear operator. The works of [42, 4, 6] show that under 
certain randomness hypothesis of the linear operator C, the NP-hard problem (13) is equivalent to 
the following nuclear norm minimization problem, which is a convex programming problem, with 
high probability: 

min s.t., C{X) = b. (14) 

In other words, the optimal solution to the convex problem (14) is also the optimal solution to the 
original NP-hard problem (13). 

Motivated by the convex nuclear norm relaxation, one way to deal with the rank constraint in 
(12) is to introduce the nuclear norm term of X, which penalizes high-ranked X's, in the objective 
function. This yields the following convex optimization formulation: 

max Tt{FX) - p\\X\\^ 

s.t. Tt{X) = 1, M"\X) G S""", (15) 

X E s"''^"^ 

where p > is a penalty parameter. It is easy to see that if the optimal solution of (15) (denoted 
by X) is of rank one, then = Tr (X) = 1, which is a constant. In this case, the term — 

added to the objective function is a constant, which leads to the fact the solution is also optimal 
with the constraint that X is rank-one. In fact, Problem (15) is the convex relaxation of the 
following problem 

max Tt{FX)-p\\X\\^ 
s.t. Tt{X) = 1, M"\X) G S""", 
X G S"'^"', rank(X) = 1, 
which is equivalent to the original problem (12) since /3||X||* = pTr {X) = p. 

After solving the convex optimization problem (15) and obtaining the optimal solution X, if 
rank(X) = 1, we can find X such that M"^(X) = i (g) • • • (8) X, according to Theorem 2.4. In 

2d 

this case, x is the optimal solution to Problem (6). The original tensor PC A problem, or the 
Z-eigenvalue problem (6), is thus solved to optimality. 

Interestingly, we found from our extensive numerical tests that the optimal solution to Problem (15) 
is a rank-one matrix almost all the time. In the following, we will show this interesting phenomenon 
by some concrete examples. The first example is taken from [24]. 
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Example 3.1 We consider a super- symmetric tensor € S defined by 

Tnu = 0.2883, J^uu = -0.0031, J^ms = 0.1973, -F1122 = -0.2485, -F1123 = -0.2939, 
-^1133 = 0.3847, -F1222 = 0.2972, ^1223 = 0.1862, ^1233 = 0.09 19, -F1333 = -0.3619, 
7-2222 = 0.1241, J-2223 = -0.34 20, J-2233 = 0.2 1 27, J-2333 = 0.27 27, J-3333 = -0.30 54. 

We want to compute the largest Z-eigenvalue of F. 

Since the size of this tensor is small, we used CVX [19] to solve Problem (15) with F = M{T) and 
p = 10. It turned out that CVX produced a rank-one solution X = aa^ G R"^^^^^, where 

a = (0.4451, 0.1649, -0.4688, 0.1649, 0.0611, -0.1737, -0.4688, -0.1737, 0.4938)^. 

Thus we get the matrix correspondence of a by reshaping a into a square matrix A: 

0.4451 0.1649 -0.4688" 
0.1649 0.0611 -0.1737 . 
-0.4688 -0.1737 0.4938 _ 

It is easy to check that A is a rank-one matrix with the nonzero eigenvalue being 1. This fur- 
ther confirms our theory on the rank-one equivalence, i.e.. Theorem 2.4. The eigenvector that 
corresponds to the nonzero eigenvalue of A is given by 

i = (-0.6671,-0.2472,0.7027)^, 

which is the optimal solution to Problem (6). 

The next example is from a real Magnetic Resonance Imaging (MRI) application studied by Ghosh 
et al. in [15]. In [15], Ghosh et al. studied a fiber detection problem in diffusion Magnetic Resonance 
Imaging (MRI), where they tried to extract the geometric characteristics from an antipodally 
symmetric spherical function (ASSF), which can be described equivalently in the homogeneous 
polynomial basis constrained to the sphere. They showed that it is possible to extract the maxima 
and minima of an ASSF by computing the stationary points of a problem in the form of (6) with 
d = 2 and n = 4. 

Example 3.2 The objective function T{x^x,x,x) in this example is given by 

0.74694xf - 0.435103a;?X2 + 0.454945a;f x| + 0.0657818xia;| + : 
+ 0.370892;?X3 - 0.29883xf X2X3 - 0.795157xix|x3 + 0.139751x^X3 + 1.24733xf: 
+ 0.714359xiX2x| + 0.316264x^x1 - 0.397391xix| - 0.405544x2x| + 0.794869x 



A = [a(l : 3),a(4 : 6),a(7 : 9)] = 
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Again, we used CVX to solve problem (15) with F = M{T) and p = 10, and a rank-one solution 
was found with X = aaJ , with 

a = (0.0001, 0.0116, 0.0004, 0.0116, 0.9984, 0.0382, 0.0004, 0.0382, 0.0015)"^ . 

By reshaping vector a, we get the following expression of matrix A: 

0.0001 0.0116 0.0004" 
0.0116 0.9984 0.0382 . 
0.0004 0.0382 0.0015_ 

It is easy to check that A is a rank-one matrix with 1 being the nonzero eigenvalue. The eigenvector 
corresponding to the nonzero eigenvalue of A is given by 

X = (0.0116,0.9992,0.0382)"^, 

which is also the optimal solution to the original problem (6). 

We then conduct some numerical tests on randomly generated examples. We construct 4-th order 
tensor T with its components drawn randomly from i.i.d. standard normal distribution. The super- 
symmetric tensor T in the tensor PCA problem is obtained by symmetrizing T- All the numerical 
experiments in this paper were conducted on an Intel Core i5-2520M 2.5GHz computer with 4GB 
of RAM, and all the default settings of CVX were used for all the tests. We choose d = 2 and the 
dimension of J- in the tensor PCA problem from n = 3 to ?i = 9. We choose p = 10. For each 
n, we tested 100 random instances. In Table 1, we report the number of instances that produced 
rank-one solutions. We also report the average CPU time (in seconds) using CVX to solve the 
problems. 



n 


rank-1 


CPU 


3 


100 


0.21 


4 


100 


0.56 


5 


100 


1.31 


6 


100 


6.16 


7 


100 


47.84 


8 


100 


166.61 


9 


100 


703.82 



Table 1: Frequency of nuclear norm penalty problem (15) having a rank-one solution 

Table 1 shows that for these randomly created tensor PCA problems, the nuclear norm penalty 
problem (15) always gives a rank-one solution, and thus always solves the original problem (6) to 
optimality. 



A = [a(l : 3),a(4 : 6),a(7 : 9)] = 
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4 Semidefinite Programming Relaxation 



In this section, we study another convex relaxation for Problem (12). Note that the constraint 

XGS"'^'^',rank(X) = l 

in (12) actually implies that X is positive semidefinite. To get a tractable convex problem, we drop 
the rank constraint and impose a semidefinite constraint to (12) and consider the following SDP 
relaxation: 

{SDR) max Tr (FX) 

s.t. Tr{X) = l, (16) 

Remark that replacing the rank-one constraint by SDP constraint is by now a common and standard 
practice; see, e.g., [1, 17, 45]. Next theorem shows that the SDP relaxation (16) is actually closely 
related to the nuclear norm penalty problem (15). 

Theorem 4.1 Let Xg^j^ and Xp^p{p) be the optimal solutions of problems (16) and (15) re- 
spectively. Suppose Eig'^{X) and Eig~{X) are the summations of nonnegative eigenvalues and 
negative eigenvalues of X respectively, i.e., 

Eig+{X):= ^iW' Eig-{X):= X,{X). 

i:A,(.Y)>0 i:\i{X)<0 

It holds that 

2{p - v) \Eig~{X*p^p{p))\ <v-Fo, 
where Fq := max J-^2d and v is the optimal value of the following optimization problem 

l<i<n 

max Tr (FX) 

s.t. = 1, (17) 

As a result, hm Tr(FX*^p(p)) = Ti:{FX*p,p). 

Proof. Observe that Mj e^ • • • e' ), where e* is the i-th unit vector, is a feasible solution for 

2d 

problem (15) with objective value Tpd — p for all 1 < z < n. Moreover, by denoting r{p) = 
\Eig-{Xf,j^p{p))\, we have 

\\X*PMpip)\U = Eig+iX*p^p{p)) + \Eig-{X*p^pip))\ 

= {Eig+{X*pj,p{p)) + Eig-{X*p^p{p))) + 2 \Eig- {X*pj,p{p))\ 
= l + 2r(p). 
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Since Xp^p{p) is optimal to problem (15), we have 

TY {FX*pj,p{p)) - p (1 + 2r(p)) > max F,2, - p > Fo - p. (18) 

l<i<ri 

Moreover, since Xpj^p{p)/\\Xpj^p{p)\\^: is feasible to problem (17), we have 

Tr{FX*p^p{p)) < \\X*p^pip)\Uv = il + 2r{p))v. (19) 
Combining (19) and (18) yields 

2{p-v)r{p)<v-Fo. (20) 

Notice that \\X\\^, = 1 implies ||^||oo is bounded for all feasible X G g^'^x'T-''^ where ||^||oo denotes 
the largest entry of X in magnitude. Thus the set {Xpj^p{p) | p > 0} is bounded. Let Xpj^p be 
one cluster point of sequence {Xpj^p{p) \ p > 0}. By taking the limit p — +oo in (20), we have 
r(p) — and thus X*p^p >z 0. Consequently, X*p^p is a feasible solution to problem (16) and 
Tr (FX^jjp) > Tr (FXpj^p). On the other hand, it is easy to check that for any < pi < p2, 

Tt{FX*sdr) < Tr:{FX*p^p{p2)) < Tt {FX*p^p{pi)), 
which implies Tr (FX^^^) < TT{FX*pj^p). Therefore, lim Tr {FX*pj^p{p)) = Tv{FX*pj^p) = 



Theorem (4.1) shows that when p goes to infinity in (15), the optimal solution of the nuclear norm 
penalty problem (15) converges to the optimal solution of the SDP relaxation (16). As we have 
shown in Table 1 that the nuclear norm penalty problem (15) returns rank-one solutions for all the 
randomly created tensor PCA problems that we tested, it is expected that the SDP relaxation (16) 
will also give rank-one solutions with high probability. In fact, this is indeed the case as shown 
through the numerical results in Table 2. As in Table 1, we tested 100 random instances for each 
n. In Table 2, we report the number of instances that produced rank-one solutions for d = 2. We 
also report the average CPU time (in seconds) using CVX to solve the problems. As we see from 
Table 2, for these randomly created tensor PCA problems, the SDP relaxation (16) always gives a 
rank-one solution, and thus always solves the original problem (6) to optimality. 



5 Alternating Direction Method of Multipliers 



The computational times reported in Tables 1 and 2 suggest that it can be time consuming to solve 
the convex problems (15) and (16) when the problem size is large (especially for the nuclear norm 
penalty problem (15)). In this section, we propose an alternating direction method of multipliers 
(ADMM) for solving (15) and (16) that fully takes advantage of the structures. ADMM is closely 
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n 


rank-1 


CPU 


3 


100 


0.14 


4 


100 


0.25 


5 


100 


0.55 


6 


100 


1.16 


7 


100 


2.37 


8 


100 


4.82 


9 


100 


8.89 



Table 2: Frequency of SDP relaxation (16) having a rank-one solution 

related to some operator-splitting methods, known as Douglas-Rachford and Peaceman-Rachford 
methods, that were proposed in 1950s for solving variational problems arising from PDEs (see 
[9, 38]). These operator-splitting methods were extensively studied later in the literature for finding 
the zeros of the sum of monotone operators and for solving convex optimization problems (see 
[32, 12, 16, 10, 11]). The ADMM we will study in this section was shown to be equivalent to 
the Douglas-Rachford operator-splitting method applied to convex optimization problem (see [13]). 
ADMM was revisited recently as it was found to be very efficient for many sparse and low-rank 
optimization problems arising from the recent emergence of compressed sensing [49], compressive 
imaging [47, 18], robust PCA [44], sparse inverse covariance selection [50, 43], sparse PCA [34] and 
SDP [48] etc. For a more complete discussion and list of references on ADMM, we refer to the 
recent survey paper by Boyd et al. [3] and the references therein. 

Generally speaking, ADMM solves the following convex optimization problem, 

mina;6Rn^j/6Rp f{x)+g{y) 

s.t. Ax + By = b (21) 

2; G C, y €V, 

where / and g are convex functions, A G R™^", B € R™^^, b G R™", C and D are some simple 
convex sets. A typical iteration of ADMM for solving (21) can be described as follows: 

x''+^ := argmin^.gc C^{x,y^;\^) 
< y^+^ := argmiUj^g^ £^(x'^+\ y; A'=) (22) 

where the augmented Lagrangian function £^(x,y;A) is defined as 

£^(x, y- X) := f{x) + g{y) - (A, Ax + By-h) + ^\\Ax + By - bf, 

A is the Lagrange multiplier and ;U > is a penalty parameter. The following theorem gives the 
global convergence of (22) for solving (21), and this has been well studied in the literature (see, 
e.g., [12, 10]). 
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Theorem 5.1 Assume both A and B are of full column rank, the sequence {{x^ ,\^)^ generated 
by (22) globally converges to a pair of primal and dual optimal solutions {x*,y*) and A* of (21) 
from any starting point. 

Because both the nuclear norm penalty problem (15) and SDP relaxation (16) can be rewritten as 
the form of (21), we can apply ADMM to solve them. 

5.1 ADMM for Nuclear Penalty Problem (15) 

Note that the nuclear norm penalty problem (15) can be rewritten equivalently as 

min -Tr (Fy) + /3||y||* 

s.t. X -Y = 0, (23) 
X €C, 

where C := {X £ S"''^"'' | Tr {X) = 1, M^\X) G S""*}. A typical iteration of ADMM for solving 

(23) can be described as 

' j^fc+i argminxgc-Tr(Fy^■) + p||y'=||*-(A^A:-y'=) + ^||X-y'=||| 

< y'^+i := argmin -Tr(Fy)+p||y||*-(A^,X^+i-y) + ^||X^'+i-y|||, (24) 
^ ^k+i ._ _ (j^fc+i _ yfc+i)/^, 

where A is the Lagrange multiplier associated with the equality constraint in (23) and /x > is a 
penalty parameter. Following Theorem 5.1, we know that the sequence {(X'^, y^. A'"')} generated 
by (24) globally converges to a pair of primal and dual optimal solutions {X* ,Y*) and A* of (23) 
from any starting point. 

Next we show that the two subproblems in (24) are both easy to solve. The first subproblem in 

(24) can be equivalently written as 

X'^+i := argmin -\\X - (Y^ + nA^)\\l, (25) 
xgc 2 

i.e., the solution of the first subproblem in (24) corresponds to the projection of Y^ + fiA^ onto 
convex set C. We will elaborate how to compute this projection in Section 5.2. 

The second subproblem in (24) can be reduced to: 

y*^+^= argmin /xp||y||* + -||y - (A^'+i - /i(A'= - F))|||. (26) 
Y 2 

This problem is known to have a closed-form solution that is given by the following so-called matrix 
shrinkage operation (see, e.g., [35]): 

y'^+i := C/Diag (max{fT - fip,0})V^ , 

where f/Diag {cr)V~^ is the singular value decomposition of matrix X^~^^ — iJ,[A^ — F). 
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5.2 The Projection 



In this subsection, we study how to solve (25), i.e., how to compute the following projection for 
any given matrix Z S g"-'*x"'': 

min \X - Z\\l 

s.t. Tt{X) = 1, (27) 
For the sake of discussion, in the following we consider the equivalent tensor representation of (27): 



min \\X — Z\^p 



s.t. ^ TT"^' ;j.i'^i2fei22fe2...n.2'=" — (28) 
fee]K(n,d) ^ ' 

where X = M~^{X), Z = M~^{Z), and the equality constraint is due to (10). Now we denote 
index set 

I = {(ii • • • i2d) G vr(l2'=i • • • n2^") | A: = (fci, • • • , A:„) G K(n, d)} . 
Then the first-order optimality conditions of Problem (28) imply 

2 \ \Tr{ii ■ ■ ■ i2d)\ Xi^...i^^ - % -i2d =0' if (ii • • • i2d) 1, 

ii---i2dG'r(n---«2d) / 

rpi^^7^^Af^2fei...„2fe„ - X] ^ii-i2d ~ ^ W^^'^ikiV = 0' otherwise. 

' ii-i2de^(l2'=l-n.2'=") / ' 

Denote ^ to be the super-symmetric counterpart of tensor Z, i.e. 



. . ^ ,. . , k(n •••i2d)| 

and a{k,d) := ( *''^^(fc ) ! ) / ( ^^%k ■ ) ! ) • Then due to the first-order optimality conditions of (28), 
the optimal solution X* of Problem (28) satisfies 



-^ii-ha = ^il-hd^ if (^1 • • • ^2d) I, 

'^r2fei ...„2fe„ = I Oi{k, d) + ii2fei ...„2fe„ , otherwise . 



(29) 

Multiplying the second equality of (29) by prn^'^l', -., and summing the resulting equality over all 

1 lj=i y'^ji- 

k = {ki, - ■ ■ ,kn) yield 

^ {d)l _X ^ {d)l ^ {dy. ^ 
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It remains to determine A. Noticing that X* is a feasible solution for problem (28), we have 

^ n'"-ife)! ^i^^i---n^^" = 1. As a result, 

keK{n,d) ^ 

and thus we derived X* and X* = M{X*) as the desired optimal solution for (27). 

5.3 ADMM for SDP Relaxation (16) 

Note that the SDP relaxation problem (16) can be formulated as 

min — Tr (FY) 

s.t. Tr(X) = l, M~1(X)gS"'^ (30) 

x-Y = o, y^o. 

A typical iteration of ADMM for solving (30) is 

argmin^gc -Tr {FY'') - {A'',X - Y'') + ^\\X - Y''\\'j, 
argminy^o -Tr (-F'^') - (A*^,X^+i - Y) + j^\\X''+^ - Y\\l (31) 
A^- (X^+i -yfc+i)/^, 

where > is a penalty parameter. Following Theorem 5.1, we know that the sequence {(X'^, y'^, A'^)} 
generated by (31) globally converges to a pair of primal and dual optimal solutions {X*,Y*) and 
A* of (30) from any starting point. 

It is easy to check that the two subproblems in (31) are both relatively easy to solve. Specifically, 
the solution of the first subproblem in (31) corresponds to the projection of Y'^ + //A'^ onto C. The 
solution of the second problem in (31) corresponds to the projection of X'''^^ + fiF — /iA'^' onto the 
positive semidefinite cone y ^ 0, i.e., 

y'^+i := C/Diag(max{a,0});7^, 

where C/Diag {(t)U^ is the eigenvalue decomposition of matrix X''^^ + fxF — fih!'. 




6 Numerical Results 

6.1 The ADMM for Convex Programs (15) and (16) 

In this subsection, we report the results on using ADMM (24) to solve the nuclear norm penalty 
problem (15) and ADMM (31) to solve the SDP relaxation (16). For the nuclear norm penalty 
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problem (15), we choose p = 10. For ADMM, we choose /x = 0.5 and we terminate the algorithms 
whenever 

II \-k vk—l II 

\\x^-^y • 

We shall compare ADMM and CVX for solving (15) and (16), using the default solver of CVX - 
SeDuMi version 1.2. We report in Table 3 the results on randomly created problems with d = 2 
and n = 7, 8, 9, 10. For each pair of d and n, we test ten randomly created examples. In Table 
3, we use 'Inst.' to denote the number of the instance. We use 'Sol.Dif.' to denote the relative 
difference of the solutions obtained by ADMM and CVX, i.e., Sol.Dif. = W^admm-XcvxWf ^nd we 
use 'Val.Dif.' to denote the relative difference of the objective values obtained by ADMM and CVX, 
i.e., Val.Dif. = '"^i^j^jJ^^Jii' - We use Tad mm and Tcvx to denote the CPU times (in seconds) of 
ADMM and CVX, respectively. From Table 3 we see that, ADMM produced comparable solutions 
compared to CVX; however, ADMM were much faster than CVX, i.e., the interior point solver, 
especially for nuclear norm penalty problem (15). Note that when n = 10, ADMM was about 500 
times faster than CVX for solving (15), and was about 8 times faster for solving (16). 

In Table 4, we report the results on larger problems, i.e., n = 14,16,18,20. Because it becomes 
time consuming to use CVX to solve the nuclear norm penalty problem (15) (our numerical tests 
showed that it took more than three hours to solve one instance of (15) for n = 11 using CVX), we 
compare the solution quality and objective value of the solution generated by ADMM for solving 
(15) with solution generated by CVX for solving SDP problem (16). From Table 4 we see that, the 
nuclear norm penalty problem (15) and the SDP problem (16) indeed produce the same solution 
as they are both close enough to the solution produced by CVX. We also see that using ADMM to 
solve (15) and (16) were much faster than using CVX to solve (16). 

6.2 Comparison with SOS 

Based on the results of the above tests, we may conclude that it is most efficient to solve the SDP 
relaxation by ADMM. In this subsection, we compare this approach with a competing method 
based on the Sum of Squares (SOS) approach (Lasserre [27, 28] and Parrilo [36, 37]), which can 
solve any general polynomial problems to any given accuracy. However, the SOS approach requires 
to solve a sequence of (possibly large) Semidefinite Programs, which limits the applicability of the 
method to solve large size problems. Henrion et al. [21] developed a specialized Matlab toolbox 
known as GloptiPoly 3 based on SOS approach, which will be used in our test. 

In Table 5 we report the results using ADMM to solve SDP relaxation of PCA problem and compare 
them with the results of applying the SOS method for the same problem. We use 'Val.' to denote 
the objective value of the solution, 'Status' to denote optimal status of GloptiPoly 3, i.e.. Status = 1 
means GloptiPoly 3 successfully identified the optimality of current solution, 'Sol.R.' to denote the 
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solution rank returned by SDP relaxation and thanks to the previous discussion 'Sol.R.=l' means 
the current solution is already optimal. From Table 5, we see that GloptiPoly 3 is very time 
consuming compared with our ADMM approach. Note that when n = 20, our ADMM was about 
30 times faster than GloptiPoly 3. Moreover, for some instances GloptiPoly 3 cannot identify the 
optimality even though the current solution is actually already optimal (see instance 5 with n = 16 
and instance 10 with n = 18). 

7 Extensions 

In this section, we show how to extend the results in the previous sections for super-symmetric 
tensor PCA problem to tensors that are not super-symmetric. 

7.1 Bi-quadratic tensor PCA 

A closely related problem to the tensor PCA problem (6) is the following bi-quadratic PCA problem: 

max Q{x,y,x,y) 

s.t. X e R",||x|| = 1, (32) 
yGR-,||y|| = 1, 

where ^ is a partial- symmetric tensor defined as follows. 

Definition 7.1 A 4-th order tensor Q € R,(™-"-)^ is called partial- symmetric if Gijke = Gkjie = 
Giikj,'^i,j,k,£. The space of all A-th order partial- symmetric tensor is denoted by "^C™"-)^. 

Various approximation algorithms for bi-quadratic PCA problem have been studied in [31]. Prob- 
lem (32) arises from the strong ellipticity condition problem in solid mechanics and the entanglement 
problem in quantum physics; see [31] for more applications of bi-quadratic PCA problem. 

It is easy to check that for given vectors a G R" and b G R'", a ^ b ^ a ^ b G '^(™-")^_ Moreover, 
we say a partial symmetric tensor G is of rank one iiQ = a<^b(Sia<Sib for some vectors a and b. 

Since Tr {xy~^yx~^) = x~^xy~^y = 1, by letting X = x0y0x^y, problem (32) is equivalent to 

max Q • X 

s.t. y~] Xjjij — 1, 

X e S rank(Af) = 1. 
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In the following, we group variables x and y together and treat x ® y as a, long vector by stacking 
its columns. Denote X = iW(Af) and G = M{Q). Then, we end up with a matrix version of the 
above tensor problem: 

max Ty{GX) 

s.t. Tr(X) = l, X^O, (33) 
M'^{X) e '^('"")', rank(X) = 1. 

As it turns out, the rank-one equivalence theorem can be extended to the partial symmetric tensors. 
Therefore the above two problems are actually equivalent. 

Theorem 7.1 Suppose A is an n x m dimensional matrix. Then the following two statements are 
equivalent: 

(i) rank(A) = 1; 

(ii) A®A G 

In other words, suppose T G "^("^")^^ then rank(J^) = 1 <;=^ rank(F) = 1, where F = M{F). 

Proof, (i) =^ (ii) is obvious. Suppose rank(A) = 1, say A = ab^ for some a G R" and b G R™. 
Then Q = A®A = a®b®a®h\s partial-symmetric. 

Conversely, suppose Q = A ® A ^ '^(''"")^. Without loss of generality, we can assume A to be 
proper (otherwise we can find a proper submatrix of A, and the whole proof goes through as well). 
Since ^ is a proper matrix, it cannot happen that Ajj = for all 1 < j < n. Otherwise, 

where the second equality is due to Q is partial-symmetric. For fixed j, we cannot have Aji = for 
all i 7^ J or Aji = for all i ^ j, which combined with Ajj = contradicts the properness of A. So 
we must have i,i ^ j such that A^j, Aji ^ 0. However, in this case, 

= Ajj An = Qjju = Qijji = AijAji / 

giving rise to a contradiction. Therefore in the following, we assume A^k 7^ for some index k. 
Again since Q G '^(™")^, 

AkjAii^ = Qkjik ~ Gijkk ~ AijAj^f^. 



Consequently, An. = Aij for any 1 < i < n implies A is of rank one. □ 
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By using the similar argument in Theorem 4.1, we can show that the following SDP relaxation 
of (33) has a good chance to get a low rank solution. 

max Tr(GX) 

s.t. Tr{X) = l, XhO, (34) 

Therefore, we used the same ADMM to solve (34). The frequency of returning rank-one solutions 
for randomly created examples is reported in Table 6. As in Table 1 and Table 2, we tested 
100 random instances for each (n, m) and report the number of instances that produced rank-one 
solutions. We also report the average CPU time (in seconds) using ADMM to solve the problems. 
Table 6 shows that the SDP relaxation (34) can give a rank-one solution for most randomly created 
instances, and thus can solve the original problem (32) to optimality with high probability. 

7.2 Tri-linear tensor PCA 

Now let us consider a highly non-symmetric case: tri-linear PCA. 

max F{x, y, z) 
s.t. X G R", ||x 

z G R^, llz 

where T G R"^™^^ is any 3-rd order tensor and n <m < I. 

Recently, tri-linear PCA problem was found to be very useful in many practical problems. For 
instance, Wang and Ahuja [46] proposed a tensor rank-one decomposition algorithm to compress 
image sequence, where they essentially solve a sequence of tri-linear PCA problems. 

By the Cauchy-Schwarz inequality, the problem (35) is equivalent to 

max ||J"(x,y,-)|| max || J"(x, y, 

s.t. x G R", ||x|| = 1, <;=^ s.t. X G R", ||x|| = 1, 
yGR'",||y|| = 1, yGR"M|y|| = l. 

We further notice 

I 

\T(x,y,-)f' = T{x,y,-)^ F{x,y,-) ='^TijkF'uvkXiyjXuyv 

k=l 

e £ 
~ ^ j F jyk -F ujk XiyvXuUj — ^ '] F ujk Fiyj^ XuyjXiy^. 

k=l k=l 



(35) 
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Therefore, we can find a partial symmetric tensor Q with 

Qijuv ^ ^ {J~ ijk -F uvk ~l~ ^ivk ^ujk ~l~ ^ujk J'ivk) /S, V i, j, U, V, 
k=l 

such that y, = Q{x,y,x,y). Hence, the tri-hnear problem (35) can be equivalently 

formulated in the form of problem (32), which can be solved by the method proposed in the 
previous subsection. 



7.3 Quadri-linear tensor PCA 



In this subsection, we consider the following quadri-linear PCA problem: 

max T{x^ ,x'^,x^,x'^) 



n n (36) 

s.t. X* G R"s = 1, Vi = 1,2,3,4, 



T{x^,x'^,x^,x'^) to a bi-quadratic function T (%,^4,%,^4) with T being partial symmetric. To 



where G j^n.ix - xn4 ^j^]^ ?^l < n.3 < n2 < n^. Let us first convert the quadri-linear function 

) to a bi-quadratic 1 
this end, we first construct Q with 

Qii,i2 ,n+i3 ,n+i4 



'F iijijsji^ if 1 ^ ^fc ^ ^k 

0, otherwise. 



Consequently, we have J-'{x ) = ^ I ^3 , ^4 , ^3 , ^4 I . Then we can further partial-symmetrize 

Q and the desired tensor T is as follows, 

'^1*2*3*4 — ^ (^H*2*3*4 ~^ Giii4,i-ii2 ~l~ ^i3*2*li4 ~^ ^43*4*1*2) ^ ^1' ^2] ^3) ^4) 

(1 2 1 2\ /I 2 1 2\ 

^3 , !1;4 , !^3 , ^4 j = ^ ( x*' x3' x4 )• Therefore, problem (36) is now reformulated as 

a bi-quadratic problem: 



19 19 

max / I ^3 , ^4 1 ^3 ) 3,4 



S.t. x* gR"',||x*|| = 1, Vi = 1,...,4. 

Moreover, we can show that the above problem is actually a bi-quadratic problem in the form 
of (32). 

Proposition 7.2 Suppose T is a fourth order partial symmetric tensor. Then problem (37) is 
equivalent to 

I 1 \ 

S.t. Vlla^'f + = 

Vl|x2||2 + ||x4||2 = ^. 
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Proof. It is obvious that (38) is a relaxation of (37). To further prove that the relaxation (38) is 
tight, we assume (x^, x^, x^,x^) is optimal to (38). Then T i^s , ^4, ^3 , = T{x^,x'^,x^,x'^) > 0, 
and so X* 7^ for all i. Moreover, notice that 



I *^ "^11^ I I I I I ^ / I I 2112 j I I I I ^ 



\/||x^||||x-^|| < \l = 1 and \/||x2||||x4|| < 

Thus 



^1 ^2 '-'^ '•4 

/y« -L rv**-' /ya^ rY^~ 

ry- I 11^'* II ||^'~|| 11-^'* II ll-^'ll I ' 

' ' ' I \ l|x-^|| ' llx^ll ' IIX'^II ' Wx"^ 

x'^ II llx^ll llz'^ II ||£''" 

1 ~2 -3 



F{xr,x^,r^,x^) 



> -F(x^x^x^x^) 

= r 



^1 ^2 -1 -2 
tJO tJC wlj 

tic 

To summarize, we have found a feasible solution ^||fT|| , ]|f^' nf^' l|i^) (^''')' '^hich is optimal 
to its relaxation (38) and thus this relaxation is tight. □ 

1 \ / 2 



By letting V = x^^.'^j-, ^ = yx'^j and using some scaling technique, we can see that problem (38) 
share the same solution with 

max T{y,z,y,z) 
s.t. \\y\\ = 1, 

which was studied in Subsection 7.1. 



7.4 Even order multi-linear PCA 

The above discussion can be extended to the even order multi-linear PCA problem: 



max J^(x-^, x^, • • • ,x^'^) 

s.t. x'' G R"% llx^ll = 1, V i = 1, 2, . . . , 2d, 

where F G ]^"^x---x"^''^ immediate relaxation of (39) is the following 

max J"(x^, x^, • • • , x^'^) 



(39) 



hd (40) 
s.t. X* G R"% Wf] ||x^||2 = V2d. 



The following result shows that these two problems are actually equivalent. 
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Proposition 7.3 It holds that problem (39) is equivalent to (40). 



Proof. It suffices to show that relaxation (40) is tight. To this end, suppose (x^,-- - ,x^'^) is an 
optimal solution of (40). Then T{x^,x'^, - ■ ■ ,x^'^) > and so 7^ for i = l,...,2d. We also 
notice 



2d 



1 

2d 

12 1 < 



2d 



\\x'f/2d = 1. 



2d 

Consequently, Y\ W^iW ^ 1 

i=l 



-1 -2 

X X 



-2d 



I 1 1 ' II ^ 1 1 ' ' 1 1 oc^^ I 



U ^2 ... J.2d^ 



T{x ,x 



> J-[x ,x , • • • ,x 



2d 

n \\xi 
1=1 



2d\ 



Therefore, we have found a feasible solution 
implying that the relaxation is tight. 



I ' ' Il'v2d| 



of (39), which is optimal to (40) 

□ 



We now focus on (40). Based on we can construct a larger tensor Q as follows 



Gil- 



k-1 k fe-1 

ji-.-j^a, a 1 + rie < ik < J2 and jk = ik - ^ m 
e=i 1=1 1=1 



0, otherwise. 



By this construction, we have 



T{x',x\--- ,x"') = g{y^_^) 

2d 

with y = {{x^)'^ , (x^)""", • • • , {x'^'^)~^)~^ . We can further symmetrize Q and find a super-symmetric T 
such that 



71, 



1 



2d 



Jl---j2dG7r(4i---t2<i) 



£=1 



and 



T{yr^) = G{yr^)=Hx,x\--- ,x'^). 

2d 2d 

Therefore, problem (40) is equivalent to 

max T{ y, ■ - , y ) 

2d 

s.t. Ilyll = ^/2d, 
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(41) 



which is further equivalent to 

max T{ y, • - , y) 

2d 

s.t. ||y|| = 1. 

Thus the methods we developed for solving (6) can be applied to solve (39). 



7.5 Odd degree tensor PCA 

The last problem studied in this section is the following odd degree tensor PCA problem: 

max T{x, ■ ■ ■ ,x) 

2d+l 
S.t. ||x|| = 1, 

where J-" is a {2d + l)-th order super-symmetric tensor. As the degree is odd, 

max 7^(3;, • • • , x) = max I J^(x, • • • , x)| = ^ max \T{x^ , ■ ■ ■ ,x'^'^~^^)\, 

2d+l 2d+l ^ ' 

where the last identity is due to Corollary 4.2 in [7]. The above formula combined with the fact 
that 

max I J^(x, • • • ,x)\ < max \J^{x,--- ,x,y)\ < max \T{x^,--- ,x'^'^~^^)\ 

2d+l ' 2d 

implies 

max J^(x, • • • , x) = max \T{x, ■ ■ ■ , x,y)\ = max T{x,- ■ ■ ,x,y). 

2d+l '2d '2d 

By using the similar technique as in Subsection 7.2, problem (41) is equivalent to an even order 
tensor PCA problem: 

max Q{x, - ■ ■ ,x) 
Ad 

s.t. ||x|| = 1, 

where Q is super-symmetric with 

'^=1 \jl---j4dG'r(n---«4d) 
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8 Conclusions 



Tensor principal component analysis is an emerging area of research with many important appli- 
cations in image processing, data analysis, statistical learning, and bio- informatics. In this paper 
we introduced a new matricization scheme, which ensures that at the rank-one tensor (in the sense 
of CP rank) is equivalent to the rank-one matrix. This enables one to apply the methodology 
in compressive sensing, in particular the Li-norm optimization technique. As it turns out, this 
approach yields a rank-one solution with a high probability. This effectively solves the tensor PCA 
problem by convex optimization, at least for randomly generated problem instances. The resulting 
convex optimization model is still large in general. We proposed to use the first-order method such 
as the ADM method, which turns out to be very effective in this case. 
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Inst. # 


Nuclear Norm Penalty (15) 


SDP (16) 




Sol.Dif. 


Val.Dif. 


Tadmm 


Tcvx 


Sol.Dif. 


Val.Dif. 


Tadmm 


Tcvx 








Dimension n = 


= 7 








1 


4.10e-005 


1.89e-007 


1.00 


32.66 


7.05e-005 


4.20e-006 


0.79 


2.44 


2 


1.62e-004 


4.19e-006 


0.91 


35.32 


1.49e-004 


2.31e-008 


0.64 


2.63 


3 


1.50e-004 


5.46e-006 


1.00 


36.78 


9.33e-005 


4.46e-006 


0.61 


2.49 


4 


1.65e-004 


4.05e-006 


0.98 


33.57 


7.15e-005 


4.17e-006 


0.63 


2.48 


5 


4.77e-005 


4.36e-006 


0.95 


32.05 


1.36e-005 


4.27e-006 


0.65 


2.40 


6 


7.07e-005 


4.64e-006 


0.95 


31.17 


7.83e-005 


4.32e-006 


0.62 


2.40 


7 


5.31e-005 


4.55e-006 


0.87 


28.72 


2.42e-005 


3.89e-006 


0.64 


2.42 


8 


5.77e-005 


3.26e-007 


0.88 


38.33 


1.40e-004 


8.58e-007 


0.67 


2.38 


9 


1.53e-004 


4.29e-006 


0.95 


32.39 


6.07e-005 


3.85e-007 


0.68 


2.43 


10 


6.15e-005 


4.13e-006 


0.98 


33.29 


2.58e-005 


1.97e-007 


0.69 


2.43 








Dimension n = 


= 8 








1 


3.90e-005 


4.23e-006 


1.87 


180.08 


1.72e-004 


3.39e-008 


1.31 


4.86 


2 


8.31e-005 


5.06e-006 


1.85 


183.90 


1.09e-004 


5.44e-006 


1.15 


4.77 


3 


5.58e-005 


4.97e-006 


1.79 


176.69 


3.47e-005 


7.18e-008 


1.27 


5.93 


4 


4.23e-005 


4.13e-006 


1.82 


170.36 


9.11e-005 


9.36e-008 


1.21 


4.49 


5 


9.12e-005 


2.02e-007 


1.69 


151.61 


9.27e-005 


9.51e-007 


1.12 


4.51 


6 


3.94e-005 


4.14e-006 


1.67 


185.01 


1.59e-004 


4.93e-006 


1.20 


4.66 


7 


2.88e-005 


5.32e-006 


1.74 


151.82 


2.46e-005 


4.65e-006 


1.17 


4.50 


8 


7.70e-005 


4.36e-006 


1.77 


176.79 


5.30e-005 


1.31e-007 


1.24 


4.70 


9 


7.62e-005 


2.11e-006 


1.46 


199.26 


3.47e-004 


8.41e-007 


0.47 


4.62 


10 


1.38e-005 


4.88e-006 


1.70 


167.18 


7.62e-005 


4.60e-006 


1.18 


4.78 








Dimension n - 


= 9 








1 


6.80e-005 


4.74e-006 


3.42 


692.29 


3.71e-004 


1.51e-007 


2.47 


8.77 


2 


1.61e-004 


4.65e-006 


3.40 


736.90 


1.57e-004 


4.77e-006 


2.19 


8.70 


3 


1.47e-004 


4.71e-006 


3.33 


672.06 


1.52e-004 


1.25e-007 


2.30 


8.83 


4 


4.92e-005 


1.33e-007 


3.02 


668.35 


1.19e-004 


3.42e-007 


2.31 


8.89 


5 


5.94e-005 


4.86e-006 


3.26 


703.02 


5.78e-005 


1.57e-007 


2.26 


8.85 


6 


l.lle-004 


5.29e-006 


3.40 


789.86 


5.54e-005 


5.95e-006 


2.29 


8.70 


7 


6.59e-005 


l.Ole-007 


3.29 


683.18 


1.08e-004 


6.15e-007 


2.28 


8.98 


8 


9.82e-005 


6.07e-007 


2.23 


728.11 


1.67e-004 


7.70e-007 


0.94 


9.19 


9 


1.28e-004 


5.32e-006 


3.27 


750.91 


9.81e-005 


4.60e-007 


1.96 


8.97 


10 


5.00e-005 


5.09e-006 


3.33 


611.53 


8.89e-005 


7.15e-007 


2.17 


8.81 








Dimension n — 


10 








1 


3.14e-005 


5.66e-006 


6.15 


3092.51 


2.55e-005 


5.88e-006 


4.17 


22.18 


2 


7.14e-005 


5.78e-006 


6.02 


2906.35 


9.97e-005 


5.78e-007 


3.81 


23.58 


3 


3.94e-005 


5.17e-006 


6.11 


3065.38 


4.05e-005 


5.44e-006 


4.21 


24.01 


4 


7.33e-005 


6.15e-006 


6.04 


3042.79 


9.31e-005 


1.79e-007 


4.33 


23.96 


5 


8.91e-005 


5.34e-006 


6.30 


3158.84 


5.80e-005 


5.81e-006 


4.36 


26.68 


6 


3.77e-005 


6.01e-006 


6.10 


2933.19 


8.58e-005 


5.69e-006 


4.36 


22.62 


7 


2.05e-004 


2.90e-007 


6.40 


2908.58 


8.87e-005 


1.38e-008 


4.02 


30.01 


8 


8.48e-005 


6.63e-006 


6.14 


3927.01 


1.02e-004 


3.68e-008 


4.07 


27.78 


9 


3.73e-005 


6.42e-006 


6.05 


3086.98 


5.88e-005 


1.47e-007 


4.06 


22.86 


10 


6.81e-005 


6.11e-006 


6.08 


3380.49 


1.91e-004 


6.45e-006 


3.91 


24.82 
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Inst. # 


SDP 


NNP 




Sol.Dif. 


Val.Dif. 


Tadmm 


Tcvx 


Sol.Dif.Ds 


Val.Dif.DS 


Tadmm 


Dimension n — 14 


1 


1.76e- 


004 


8.26e- 


006 


31.95 


188.98 


1.76e- 


004 


8.77e- 


006 


53.06 


2 


4.37e- 


004 


2.35e- 


007 


30.86 


187.38 


4.37e- 


004 


8.17e- 


006 


44.70 


3 


2.40e- 


004 


8.19e- 


006 


28.42 


192.68 


2.40e- 


004 


8.12e- 


006 


48.06 


4 


1.24e- 


004 


8.17e- 


006 


32.94 


198.02 


1.24e- 


004 


8.28e- 


006 


51.91 


5 


3.87e- 


004 


8.16e- 


006 


34.90 


189.34 


3.87e- 


004 


8.04e- 


006 


48.87 


6 


4.53e- 


005 


8.63e- 


006 


24.86 


201.61 


4.49e- 


005 


8.39e- 


006 


50.92 


7 


6.71e- 


004 


8.38e- 


006 


30.25 


193.43 


6.71e- 


004 


8.65e- 


006 


51.50 


8 


1.61e- 


004 


7.34e- 


006 


32.05 


198.15 


1.61e- 


004 


4.95e- 


007 


47.14 


9 


9.40e- 


005 


8.43e- 


006 


37.79 


199.00 


9.41e- 


005 


8.18e- 


006 


54.02 


10 


1.67e- 


004 


7.44e- 


006 


31.60 


191.67 


1.67e- 


004 


5.40e- 


007 


47.96 


Dimension n — 16 


1 


4.03e- 


005 


8.74e- 


006 


68.11 


732.87 


4.05e- 


005 


9.29e- 


006 


147.47 


2 


1.72e- 


004 


9.31e- 


006 


71.91 


684.05 


1.71e- 


004 


9.76e- 


006 


180.08 


3 


5.38e- 


005 


9.07e- 


006 


91.38 


687.36 


5.39e- 


005 


9.79e- 


006 


166.88 


4 


1.46e- 


004 


9.26e- 


006 


79.85 


689.29 


1.46e- 


004 


2.87e- 


008 


166.12 


5 


8.27e- 


005 


8.85e- 


006 


75.19 


682.30 


8.22e- 


005 


3.59e- 


007 


181.18 


6 


3.17e- 


005 


9.35e- 


006 


80.15 


703.35 


3.17e- 


005 


9.34e- 


006 


179.90 


7 


5.66e- 


005 


1.88e- 


007 


82.60 


696.51 


5.73e- 


005 


9.17e- 


006 


171.28 


8 


2.30e- 


004 


2.56e- 


007 


78.83 


702.43 


2.30e- 


004 


2.89e- 


007 


167.25 


9 


9.24e- 


005 


9.48e- 


006 


77.83 


735.89 


9.26e- 


005 


9.79e- 


006 


152.90 


10 


3.37e- 


004 


9.79e- 


006 


79.17 


706.87 


3.37e- 


004 


2.67e- 


007 


155.83 


Dimension n = 18 


1 


1.09e- 


004 


3.34e- 


007 


104.48 


2254.16 


1.09e- 


004 


2.61e- 


007 


220.93 


2 


4.41e- 


004 


1.13e- 


005 


158.64 


2016.83 


4.41e- 


004 


8.69e- 


008 


281.94 


3 


1.55e- 


004 


2.68e- 


007 


191.92 


2249.16 


1.55e- 


004 


1.97e- 


007 


338.29 


4 


1.85e- 


004 


1.09e- 


005 


177.42 


2004.44 


1.85e- 


004 


4.31e- 


007 


295.31 


5 


5.03e- 


005 


1.07e- 


005 


169.82 


2079.49 


5.03e- 


005 


1.07e- 


005 


266.01 


6 


1.37e- 


004 


1.29e- 


007 


166.35 


2039.80 


1.37e- 


004 


4.09e- 


007 


260.92 


7 


4.13e- 


004 


1.09e- 


005 


167.65 


2054.42 


4.13e- 


004 


1.09e- 


005 


280.23 


8 


2.85e- 


004 


1.39e- 


006 


48.50 


2383.20 


2.85e- 


004 


1.08e- 


005 


159.33 


9 


5.89e- 


005 


l.OOe- 


005 


163.10 


2259.23 


5.95e- 


005 


1.07e- 


005 


270.30 


10 


1.49e- 


004 


1.07e- 


005 


173.84 


2187.49 


1.49e- 


004 


1.02e- 


005 


290.24 


Dimension n = 20 


1 


5.58e- 


004 


6.70e- 


007 


389.27 


5978.72 


5.58e- 


004 


1.21e- 


005 


644.93 


2 


2.30e- 


004 


1.33e- 


007 


350.43 


6040.70 


2.30e- 


004 


1.54e- 


007 


636.87 


3 


3.66e- 


004 


1.23e- 


005 


372.13 


5245.99 


3.66e- 


004 


1.19e- 


005 


649.96 


4 


1.17e- 


004 


2.89e- 


007 


357.42 


5529.12 


1.18e- 


004 


1.19e- 


005 


689.26 


5 


5.58e- 


004 


6.70e- 


007 


387.78 


5575.93 


5.58e- 


004 


1.21e- 


005 


641.90 


6 


8.80e- 


005 


1.52e- 


007 


385.76 


6284.73 


8.88e- 


005 


1.17e- 


005 


641.87 


7 


1.31e- 


004 


1.22e- 


005 


388.05 


5812.31 


1.30e- 


004 


1.22e- 


007 


726.95 


8 


4.55e- 


004 


1.18e- 


005 


367.04 


5305.20 


4.55e- 


004 


2.17e- 


008 


629.76 


9 


5.88e- 


004 


1.40e- 


007 


389.02 


5186.03 


5.88e- 


004 


1.96e- 


007 


670.42 


10 


4.47e- 


004 


2.38e- 


007 


359.13 


5559.39 


4.47e- 


004 


1.14e- 


005 


695.63 



Table 4: Comparison of CVX and ADMM for large-scale problems 
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Inst. # 


GLP 


SDP by ADMM 




Val. 


Time 


Status 


Val. 


Time 


Sol.R. 






Dimension n — 


14 






1 


5.59 


108.65 


1 


5.59 


29.18 


1 


2 


5.83 


90.36 


1 


5.83 


32.53 


1 


3 


5.18 


116.07 


1 


5.18 


9.88 


1 


4 


5.32 


101.22 


1 


5.32 


30.82 


1 


5 


5.82 


99.11 


1 


5.82 


30.86 


1 


6 


5.71 


108.86 


1 


5.71 


30.57 


1 


7 


5.91 


108.60 


1 


5.91 


28.94 


1 


8 


5.69 


110.22 


1 


5.69 


31.03 


1 


9 


6.89 


83.20 


1 


6.89 


31.49 


1 


10 


5.96 


103.31 


1 


5.96 


29.58 


1 






Dimension n — 


16 






1 


6.53 


434.68 


1 


6.53 


93.02 


1 


2 


6.65 


358.54 


1 


6.65 


102.13 


1 


3 


6.28 


401.87 


1 


6.28 


92.15 


1 


4 


6.58 


421.67 


1 


6.58 


92.17 


1 


5 


6.26 


383.33 





6.26 


92.12 


1 


6 


6.80 


429.70 


1 


6.80 


86.28 


1 


7 


6.00 


453.46 


1 


6.00 


79.87 


1 


8 


6.73 


314.64 


1 


6.72 


93.27 


1 


9 


6.34 


383.75 


1 


6.34 


79.20 


1 


10 


6.66 


381.94 


1 


6.66 


92.71 


1 






Dimension n — 


18 






1 


6.65 


1279.10 


1 


6.65 


152.64 


1 


2 


6.37 


1619.29 


1 


6.37 


84.49 


1 


3 


6.97 


1393.22 


1 


6.97 


155.34 


1 


4 


7.00 


1418.60 


1 


7.00 


197.82 


1 


5 


7.49 


1289.53 


1 


7.49 


173.35 


1 


6 


6.99 


1463.83 


1 


6.99 


178.05 


1 


7 


6.77 


1476.79 


1 


6.77 


180.71 


1 


8 


6.87 


1426.56 


1 


6.87 


165.64 


1 


9 


6.97 


1363.82 


1 


6.97 


176.70 


1 


10 


6.73 


1317.64 





6.73 


206.81 


1 






Dimension n — 


20 






1 


7.40 


8614.12 


1 


7.40 


317.03 


1 


2 


7.33 


9203.73 


1 


7.33 


357.46 


1 


3 


7.21 


8548.82 


1 


7.21 


374.20 


1 


4 


7.32 


7866.87 


1 


7.32 


364.95 


1 


5 


7.28 


8641.94 


1 


7.28 


347.89 


1 


6 


7.02 


6725.36 





7.02 


382.33 


1 


7 


7.40 


12273.17 


1 


7.40 


389.29 


1 


8 


6.93 


13298.42 


1 


6.93 


328.74 


1 


9 


7.33 


12274.62 


1 


7.33 


363.64 


1 


10 


6.68 


8411.78 


1 


6.68 


386.33 


1 



Table 5: Comparison between GloptiPoly 3 and the 
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SDP Relaxation by ADMM with ^ 



= 0.5. 



Dim (n, m) 


rank-1 


CPU 


(4,4) 


100 


0.12 


(4,6) 


100 


0.25 


(6,6) 


100 


0.76 


(6,8) 


100 


1.35 


(8,8) 


98 


2.30 


(8,10) 


100 


3.60 


(10,10) 


96 


5.77 



Table 6: Frequency of problem (34) having rank-one solution 
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